I–vector transformation and scaling for PLDA based speaker recognition
نویسندگان
چکیده
This paper proposes a density model transformation for speaker recognition systems based on i–vectors and Probabilistic Linear Discriminant Analysis (PLDA) classification. The PLDA model assumes that the i-vectors are distributed according to the standard normal distribution, whereas it is well known that this is not the case. Experiments have shown that the i–vector are better modeled, for example, by a Heavy–Tailed distribution, and that significant improvement of the classification performance can be obtained by whitening and length normalizing the i-vectors. In this work we propose to transform the i–vectors, extracted ignoring the classifier that will be used, so that their distribution becomes more suitable to discriminate speakers using PLDA. This is performed by means of a sequence of affine and non–linear transformations whose parameters are obtained by Maximum Likelihood (ML) estimation on the training set. The second contribution of this work is the reduction of the mismatch between the development and test i–vector distributions by means of a scaling factor tuned for the estimated i– vector distribution, rather than by means of a blind length normalization. Our tests performed on the NIST SRE-2010 and SRE-2012 evaluation sets show that improvement of their Cost Functions of the order of 10% can be obtained for both evaluation data.
منابع مشابه
PLDA Modeling in I-Vector and Supervector Space for Speaker Verification
In this paper, we advocate the use of uncompressed form of ivector. We employ the probabilistic linear discriminant analysis (PLDA) to handle speaker and session variability for speaker verification task. An i-vector is a low-dimensional vector containing both speaker and channel information acquired from a speech segment. When PLDA is used on i-vector, dimension reduction is performed twice – ...
متن کاملFrom Features to Speaker Vectors by means of Restricted Boltzmann Machine Adaptation
Restricted Boltzmann Machines (RBMs) have shown success in different stages of speaker recognition systems. In this paper, we propose a novel framework to produce a vector-based representation for each speaker, which will be referred to as RBMvector. This new approach maps the speaker spectral features to a single fixed-dimensional vector carrying speaker-specific information. In this work, a g...
متن کاملSTC Speaker Recognition System for the NIST i-Vector Challenge
This paper presents a Speech Technology Center (STC) system submitted to the NIST i-vector Challenge. The system includes different subsystems based on PLDA, LDA-SVM, RBM-PLDA and DBN-PLDA. We propose an original iterative scheme for clustering the NIST i-vector Challenge devset. We also introduce the RBM-PLDA subsystem in the NIST i-vector Challenge. Experiments performed on the progress datas...
متن کاملPLDA in the I-Supervector Space for Text-Independent Speaker Verification
In this paper, we advocate the use of the uncompressed form of i-vector and depend on subspace modeling using probabilistic linear discriminant analysis (PLDA) in handling the speaker and session (or channel) variability. An i-vector is a low-dimensional vector containing both speaker and channel information acquired from a speech segment. When PLDA is used on an i-vector, dimension reduction i...
متن کاملI-Vector/PLDA Variants for Text-Dependent Speaker Recognition
The i-vector/PLDA approach currently dominates the field of text-independent speaker recognition and the question of how to translate this methodology to the text-dependent domain has recently become an active area of research. The essential difference between the two fields is that it is possible to do speaker recognition with enrollment and test utterances of very short duration in the text-d...
متن کامل